Why?
The Web has lots of stuff
- frontier beyond curated datasets
- stuff is wrapped in HTML
- HTML is transported over HTTP but composed for h2m consumption
- Intellectual Property rights bear serious consideration
API
Application Programming Interface
- Built for machine-to-machine interactions
- Instructions for programs
Client / Server
- Make [R] interface with the web
- Same as h2m but now m2m
JSON
- JavaScript Object Notation is a language-independent data format
- Currently the most common data format for asynchronous client/server communication
- Consists of key-value pairs
# from https://en.wikipedia.org/wiki/JSON
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
},
{
"type": "mobile",
"number": "123 456-7890"
}
],
"children": [],
"spouse": null
}
Example
Demonstration
library(jsonlite)
# https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
# for building tibbles
library(tidyverse)
Single JSON object
When the server response is a single JSON object, jsonlite makes viewing the data pretty simple: fromJSON() returns it as a named list.
oneJSONresult <- fromJSON("http://www.omdbapi.com/?t=rocky&y=&plot=full&r=json")
Let’s see the results in the next slide
oneJSONresult
$Title
[1] "Rocky"
$Year
[1] "1976"
$Rated
[1] "PG"
$Released
[1] "03 Dec 1976"
$Runtime
[1] "120 min"
$Genre
[1] "Drama, Sport"
$Director
[1] "John G. Avildsen"
$Writer
[1] "Sylvester Stallone"
$Actors
[1] "Sylvester Stallone, Talia Shire, Burt Young, Carl Weathers"
$Plot
[1] "Rocky Balboa is a struggling boxer trying to make the big time, working as a debt collector for a pittance. When heavyweight champion Apollo Creed visits Philadelphia, his managers want to set up an exhibition match between Creed and a struggling boxer, touting the fight as a chance for a \"nobody\" to become a \"somebody\". The match is supposed to be easily won by Creed, but someone forgot to tell Rocky, who sees this as his only shot at the big time."
$Language
[1] "English"
$Country
[1] "USA"
$Awards
[1] "Won 3 Oscars. Another 16 wins & 21 nominations."
$Poster
[1] "https://images-na.ssl-images-amazon.com/images/M/MV5BMTY5MDMzODUyOF5BMl5BanBnXkFtZTcwMTQ3NTMyNA@@._V1_SX300.jpg"
$Metascore
[1] "N/A"
$imdbRating
[1] "8.1"
$imdbVotes
[1] "387,927"
$imdbID
[1] "tt0075148"
$Type
[1] "movie"
$Response
[1] "True"
The resulting list behaves as you would expect in R.
- You can list all the variable names.
names(oneJSONresult)
[1] "Title" "Year" "Rated" "Released" "Runtime" "Genre" "Director" "Writer" "Actors"
[10] "Plot" "Language" "Country" "Awards" "Poster" "Metascore" "imdbRating" "imdbVotes" "imdbID"
[19] "Type" "Response"
- List an individual element
oneJSONresult$Title
[1] "Rocky"
oneJSONresult$Awards
[1] "Won 3 Oscars. Another 16 wins & 21 nominations."
A JSON Matrix
The results of this code snippet display differently in the console, the Notebook script (console), and the Notebook HTML output. In the Notebook script output you can find the component name, in this case $Search. Alternatively, you can use bracket notation: [[1]]. Once you have identified the component name, it is easier to identify the element names.
jsonSeriesResutlsMatrix <- fromJSON("http://www.omdbapi.com/?s=rocky&type=series&r=json&page=1")
jsonSeriesResutlsMatrix
$Search
$totalResults
[1] "20"
$Response
[1] "True"
Call the search results and coerce the JSON array into a data frame.
jsonSeriesResutlsMatrix$Search
jsonSeriesResutlsMatrix$Search$Title
[1] "Rocky and His Friends" "Dr. Jeff: Rocky Mountain Vet" "Rocky Jones, Space Ranger" "Rocky Mountain Law"
[5] "Rocky King, Detective" "Rocky Road" "Rocky Mountain Bounty Hunters" "Rocky + Drago"
[9] "Rocky Point" "Rocky Star"
Resources
- RStudio httr video
- jsonlite package
- list of images
- Movies of 1976
---
title: "Using R to Orchestrate APIs"
author: "John Little"
date: '`r Sys.Date()`'
output:
  slidy_presentation: default
  html_notebook: default
---
## Using R to Orchestrate APIs

A presentation for [Research Data at the Edge](http://library.duke.edu/edge/events/rc17), Day One of the [Duke Research Computing Symposium](https://rc.duke.edu/symposium-2017/)

Hosted by the [Data & Visualization Services](http://library.duke.edu/data/) Department.  

### The Files
- GitHub repo -- https://github.com/libjohn/r-api-json 
- Slides -- https://libjohn.github.com/rcs2017/slides.html
- Notebook -- http://libjohn.github.io/rcs2017/notebook.html 

The presentation materials were composed in *R Markdown* via *RStudio* and stored in a *GitHub repository*; the slides and notebook are served via *GitHub Pages*.  



## Outline

* API
* JSON
* R / RStudio

## Why?

### The Web has lots of stuff
+ frontier beyond curated datasets
+ stuff is wrapped in HTML
+ HTML is transported over HTTP but composed for h2m consumption
+ Intellectual Property rights bear serious consideration

<!-- NASA animated GIF ///  http://i.giphy.com/l2Jht4lIfEQfJ3zj2.gif    --> 
<!--  good human handshake ///  http://giphy.com/gifs/thomas-U2XboRuN89Idi -->
<!-- after the research handshake is complete /// http://giphy.com/gifs/80s-1980s-thomas-dolby-wCKmBd7oNtA4g  --> 
<!-- the confusion of the m2m handshake ///   http://giphy.com/gifs/thomas-MjkCYjM46NrrO -->

## API

### Application Programming Interface 

* Built for machine-to-machine interactions
* Instructions for programs

<!-- http://mobile-gps.net/2015/01/ -->
![](images/api.png)


---    

### Client / Server 


![](images/Client-server-model.svg.png) 

* Make [R] interface with the web
* Same as h2m but now m2m


<!-- https://pixabay.com/en/client-server-networking-laptop-341420/ -->
---  

### Human Simulation

#### A dramatization...

* Person uses Web Client
    + Person enters a URL<br>
    ![](images/URL.PNG)
    
    + client & server negotiate<br> 
    ![dramatization: good handshake](images/good-handshake.gif) 
    + Information is sent back in wrapped HTML
    + Web Browser parses the HTML 
    
<!-- https://commons.wikimedia.org/wiki/File:Uniform_Resource_Locator_(URL)_example.PNG -->
<!-- https://commons.wikimedia.org/wiki/File:HTML.svg -->

## m2m -- development


![dramatization: confused about the protocol](images/development-confusion.gif)
    
## JSON

* [JavaScript Object Notation](https://en.wikipedia.org/wiki/JSON) is a language-independent data format
* Currently the most common data format for asynchronous client/server communication
* Consists of key-value pairs

<!-- http://i.vimeocdn.com/video/541935816_1280x720.jpg -->
<!-- Vimeo on What is JSON // https://vimeo.com/144162102 -->


```{json example}
# from https://en.wikipedia.org/wiki/JSON
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
```
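
To connect the notation above to R, here is a minimal sketch of how `jsonlite::fromJSON()` maps a trimmed copy of the Wikipedia example onto R structures. It assumes jsonlite's default simplification settings; the names `person_json` and `person` are illustrative only.

```r
library(jsonlite)

# A trimmed copy of the example above, held in an R string (no network needed).
person_json <- '{
  "firstName": "John",
  "isAlive": true,
  "age": 25,
  "address": { "city": "New York", "state": "NY" },
  "phoneNumbers": [
    { "type": "home",   "number": "212 555-1234" },
    { "type": "office", "number": "646 555-4567" }
  ],
  "children": [],
  "spouse": null
}'

person <- fromJSON(person_json)

# A JSON object becomes a named list; nested objects become nested lists.
person$firstName
person$address$city

# An array of objects with the same keys is simplified to a data frame.
person$phoneNumbers
class(person$phoneNumbers)
```

The key-value pairs become list names and values, so the `$` accessor follows the same path you would read in the JSON text.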


## Example

### To Follow Along
1. Open an RStudio Docker Container - https://vm-manage.oit.duke.edu/containers/rstudio 
2. Project > New Project
3. Version Control > Git 
4. Repository URL = https://github.com/libjohn/r-api-json.git > Create Project 
5. Open *API-JSON-Symposium.Rmd* file
    + Run All
    + GoTo Line 150-ish ("### Demonstration") 

--- 

### OMDB api 

- http://www.omdb.org/
    - like http://imdb.com/
- no API keys required
- http://www.omdbapi.com/

--- 

### Demonstration


```{r load-library-package, message=FALSE, warning=TRUE}
library(jsonlite)
# https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html

# for building tibbles
library(tidyverse)
```


### Single JSON object
When the server response is a single JSON object, jsonlite makes viewing the data pretty simple: `fromJSON()` returns it as a named list.
```{r singleJSONresult}
oneJSONresult <- fromJSON("http://www.omdbapi.com/?t=rocky&y=&plot=full&r=json")
```
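
The request URL above is just a base address plus `name=value` pairs joined by `&`. As a rough sketch, the same string can be assembled from a named list in base R. The helper `build_query_url()` is hypothetical and not part of jsonlite, and the live service may also expect an `apikey` parameter, which is an assumption about the current API rather than something these slides show.

```r
# Hypothetical helper: glue named parameters onto a base URL as a query string.
build_query_url <- function(base_url, params) {
  # Percent-encode each value so spaces and reserved characters are URL-safe.
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Same parameters as the call above: title, (empty) year, full plot, JSON response.
rocky_url <- build_query_url(
  "http://www.omdbapi.com/",
  list(t = "rocky", y = "", plot = "full", r = "json")
)
rocky_url
```

Keeping the parameters in a list makes it easier to swap the title or add a year without editing a long string by hand.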

Let's see the results in the next slide

---

```{r}
oneJSONresult
```


--- 

##### The resulting list behaves as you would expect in R.  

- You can list all the variable names.

```{r}
names(oneJSONresult)
```

- List an individual element


```{r}
oneJSONresult$Title
```

```{r}
oneJSONresult$Awards
```
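
Every value in this response arrives as a character string, including the year, the rating, and the vote count. The sketch below shows one way to convert a few of them to numbers. The list `rocky` is a hand-copied subset of the Rocky response so the code runs offline; `rocky_typed` is an illustrative name.

```r
# Hand-copied subset of the Rocky response, so this runs without a network call.
rocky <- list(
  Title      = "Rocky",
  Year       = "1976",
  Runtime    = "120 min",
  imdbRating = "8.1",
  imdbVotes  = "387,927"
)

rocky_typed <- list(
  Title      = rocky$Title,
  Year       = as.integer(rocky$Year),
  Runtime    = as.integer(sub(" min$", "", rocky$Runtime)),  # strip the unit
  imdbRating = as.numeric(rocky$imdbRating),
  imdbVotes  = as.integer(gsub(",", "", rocky$imdbVotes))    # strip thousands separators
)

str(rocky_typed)
```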


---

### A JSON Matrix
The **results of this code snippet display differently** in the *console*, the *Notebook script* (console), and the *Notebook HTML* output.  In the Notebook script output you can find the component name, in this case `$Search`.  Alternatively, you can use bracket notation: `[[1]]`.  Once you have identified the component name, it is easier to identify the element names.
```{r}
jsonSeriesResutlsMatrix <- fromJSON("http://www.omdbapi.com/?s=rocky&type=series&r=json&page=1")
jsonSeriesResutlsMatrix
```
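
If the printed output is hard to read, the structure can be inspected directly. This is a small offline sketch: `search_json` is a cut-down stand-in for the search response with only two titles, and the `Type` field is assumed from the `type=series` query.

```r
library(jsonlite)

# Offline stand-in for the search response: two titles only, other fields omitted.
search_json <- '{
  "Search": [
    { "Title": "Rocky and His Friends",     "Type": "series" },
    { "Title": "Rocky Jones, Space Ranger", "Type": "series" }
  ],
  "totalResults": "20",
  "Response": "True"
}'
search_result <- fromJSON(search_json)

names(search_result)               # component names
str(search_result, max.level = 1)  # one summary line per component

# `$Search` and `[[1]]` reach the same component.
identical(search_result$Search, search_result[[1]])

names(search_result$Search)        # element (column) names inside the component
```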

---  

### Call the search results and coerce the JSON array into a data frame.
```{r}
jsonSeriesResutlsMatrix$Search
```

--- 
```{r}
jsonSeriesResutlsMatrix$Search$Title
```
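
The request asked for `page=1`, and the response reports more results than it returns on one page. A possible way to collect every page is sketched below. Both `fetch_all_pages()` and `fake_fetch()` are hypothetical helpers, and the page size of 10 is an assumption inferred from the single page of titles; the stub keeps the example offline.

```r
# Hypothetical helper: `fetch_page` is any function that takes a page number and
# returns a list shaped like the search response ($Search, $totalResults).
fetch_all_pages <- function(fetch_page, per_page = 10) {
  first   <- fetch_page(1)
  n_pages <- ceiling(as.integer(first$totalResults) / per_page)
  pages   <- list(first$Search)
  # Fetch pages 2..n_pages; the loop is skipped when there is only one page.
  for (page in seq_len(n_pages)[-1]) {
    pages[[page]] <- fetch_page(page)$Search
  }
  do.call(rbind, pages)
}

# Offline stub standing in for the API: 20 made-up results, 10 per page.
fake_fetch <- function(page) {
  ids <- ((page - 1) * 10 + 1):(page * 10)
  list(
    Search       = data.frame(Title = paste("Title", ids), stringsAsFactors = FALSE),
    totalResults = "20"
  )
}

all_results <- fetch_all_pages(fake_fetch)
nrow(all_results)
```

Against the real service, `fetch_page` could wrap `fromJSON()` around the search URL with the page number pasted on the end. The combined data frame can then be passed to `tibble::as_tibble()` for tidier printing.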


## R Packages -- Related

*People who use jsonlite also use...*

* [httr](https://cran.r-project.org/web/packages/httr/) -- calls jsonlite in service of its main goal: orchestrating HTTP requests (web scraping)
* [rvest](https://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/) --  used for HTML parsing
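
As a rough illustration of how httr and jsonlite fit together, the sketch below wraps a GET request, an HTTP status check, and JSON parsing in one function. `get_json()` is a hypothetical wrapper, not part of either package, and the example call is commented out because it needs network access.

```r
# Hypothetical wrapper around httr + jsonlite.
get_json <- function(url, query = list()) {
  response <- httr::GET(url, query = query)
  httr::stop_for_status(response)  # turn HTTP error statuses into R errors
  body <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)
}

# Example call (commented out so the block runs offline):
# rocky <- get_json("http://www.omdbapi.com/",
#                   query = list(t = "rocky", plot = "full", r = "json"))
```

Compared with passing a URL straight to `fromJSON()`, this form exposes the HTTP response, so failed requests can be handled before any parsing is attempted.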

## Resources 

- RStudio httr video
- jsonlite package
- list of images
- Movies of 1976
    - [OMDB Top Movies](http://www.omdb.org/encyclopedia/year/1976/statistics)
    - [IMDB Most Popular](http://www.imdb.com/year/1976/)

